
Neural Information Processing Systems

However, maximum-margin classifiers are inherently robust to perturbations of data at prediction time, and this implication is at odds with concrete evidence that neural networks, in practice, are brittle to adversarial examples [71] and distribution shifts [52, 58, 44, 65]. Hence, the linear setting, while convenient to analyze, is insufficient to capture the non-robustness of neural networks trained on real datasets. Going beyond the linear setting, several works [1, 49, 74] argue that neural networks generalize well because standard training procedures have a bias towards learning
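The margin-robustness claim at the start of this fragment can be made concrete with a toy sketch (illustrative only, not from the paper): for a linear classifier sign(w·x + b), no perturbation with Euclidean norm smaller than the point's margin |w·x + b| / ||w|| can flip the prediction.

```python
import numpy as np

# Toy max-margin robustness check (illustrative values, not from the paper).
w = np.array([2.0, 1.0])
b = -1.0
x = np.array([1.5, 1.0])  # classified positive: w @ x + b = 3.0

# Distance from x to the decision boundary {z : w @ z + b = 0}.
margin = abs(w @ x + b) / np.linalg.norm(w)

# Worst-case perturbation pushes x straight toward the boundary; staying
# just inside the margin cannot change the predicted sign.
delta = -0.99 * margin * w / np.linalg.norm(w)
sign_before = np.sign(w @ x + b)
sign_after = np.sign(w @ (x + delta) + b)
print(sign_before == sign_after)  # True: prediction unchanged inside the margin
```

The abstract's point is that this guarantee is specific to the linear setting; empirically, neural networks on real data enjoy no such tolerance.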




Understanding Global Feature Contributions With Additive Importance Measures

Neural Information Processing Systems

Most recent research has addressed this by focusing on local interpretability, which explains a model's individual predictions (e.g., the role of each feature in a patient's diagnosis) [25, 30, 34, 38]. Two special cases are S = ∅ and S = D, which respectively correspond to the mean prediction f_∅(x) = E[f(X)] and the full model prediction f_D(x) = f(x).
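The two special cases can be illustrated with a minimal sketch (hypothetical model and data, not the paper's implementation): approximate the restricted prediction f_S(x) by fixing the features in S at their values in x and averaging the model over the data for the rest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model and data, for illustration only.
def f(X):
    return X[:, 0] + 2.0 * X[:, 1]

X = rng.normal(size=(10_000, 2))

def f_S(x, S):
    """Restricted prediction: fix features in S, average out the rest."""
    Xs = X.copy()
    Xs[:, list(S)] = x[list(S)]  # pin the features in S at their values in x
    return f(Xs).mean()

x = np.array([1.0, 0.5])
f_empty = f_S(x, set())    # S = {}: reduces to E[f(X)], the mean prediction
f_full = f_S(x, {0, 1})    # S = D: reduces to f(x), the full model prediction
```

With S empty no feature is pinned, so the average is exactly the mean prediction; with S = D every row equals x, so the average is exactly f(x).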




Explanations that reveal all through the definition of encoding

Neural Information Processing Systems

Feature attributions attempt to highlight what inputs drive predictive power. Good attributions or explanations are thus those that produce inputs that retain this predictive power; accordingly, evaluations of explanations score their quality of prediction. However, for a class of explanations called encoding explanations, evaluations produce scores better than what appears possible from the values in the explanation. Probing for encoding remains a challenge because there is no general characterization of what gives the extra predictive power. We develop a definition of encoding that identifies this extra predictive power via conditional dependence and show that the definition fits existing examples of encoding. This definition implies, in contrast to encoding explanations, that non-encoding explanations contain all the informative inputs used to produce the explanation, giving them a "what you see is what you get" property, which makes them transparent and simple to use. Next, we prove that existing scores (ROAR, FRESH, EVAL-X) do not rank non-encoding explanations above encoding ones, and develop STRIPE-X, which ranks them correctly. After empirically demonstrating the theoretical insights, we use STRIPE-X to show that despite prompting an LLM to produce non-encoding explanations for a sentiment analysis task, the LLM-generated explanations encode.
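The evaluation style the abstract refers to, scoring an explanation by how much predictive power survives when the input is reduced to the selected features, can be sketched as follows (a toy masking-based evaluation on synthetic data; this is not STRIPE-X or any of the named scores).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task where only features 0 and 1 carry label information.
n, d = 2_000, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def masked_accuracy(selected):
    """Score an explanation by predicting from the selected features alone."""
    Xm = np.zeros_like(X)
    Xm[:, selected] = X[:, selected]   # mask everything the explanation drops
    preds = (Xm[:, 0] + Xm[:, 1] > 0).astype(int)
    return (preds == y).mean()

good = masked_accuracy([0, 1])   # keeps all informative inputs
bad = masked_accuracy([2, 3])    # keeps none; near-chance accuracy
```

An encoding explanation would be one whose score exceeds what the retained feature values alone should support, which is exactly the behavior the paper's conditional-dependence definition is built to detect.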


Deep Ensembles Work, But Are They Necessary?

Neural Information Processing Systems

Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of individual larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer distinct benefits beyond predictive power: namely, uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits, and show that a single (but larger) neural network can replicate these qualities. First, we show that ensemble diversity, by any metric, does not meaningfully contribute to an ensemble's ability to detect out-of-distribution (OOD) data, but is instead highly correlated with the relative improvement of a single larger model. Second, we show that the OOD performance afforded by ensembles is strongly determined by their in-distribution (InD) performance, and - in this sense - is not indicative of any effective robustness. While deep ensembles are a practical way to achieve improvements to predictive power, uncertainty quantification, and robustness, our results show that these improvements can be replicated by a (larger) single model.
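As a minimal sketch of the deep-ensemble prediction the abstract discusses (illustrative random logits, not the paper's models): members' class probabilities are averaged, and the entropy of the averaged distribution is a common uncertainty signal for OOD detection.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# 5 ensemble members, 100 inputs, 10 classes (random stand-in logits).
logits = rng.normal(size=(5, 100, 10))
member_probs = softmax(logits)
ensemble_probs = member_probs.mean(axis=0)   # the deep-ensemble prediction

# Predictive entropy of the averaged distribution: one simple OOD score.
entropy = -(ensemble_probs * np.log(ensemble_probs)).sum(axis=-1)
```

Because entropy is concave, the ensemble's predictive entropy is never below the average member entropy; the paper's claim is that this kind of uncertainty benefit can also be matched by a single larger model.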


Understanding Global Feature Contributions With Additive Importance Measures

Neural Information Processing Systems

Understanding the inner workings of complex machine learning models is a long-standing problem and most recent research has focused on local interpretability. To assess the role of individual input features in a global sense, we explore the perspective of defining feature importance through the predictive power associated with each feature. We introduce two notions of predictive power (model-based and universal) and formalize this approach with a framework of additive importance measures, which unifies numerous methods in the literature. We then propose SAGE, a model-agnostic method that quantifies predictive power while accounting for feature interactions. Our experiments show that SAGE can be calculated efficiently and that it assigns more accurate importance values than other methods.
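The Shapley-style "predictive power per feature" idea behind SAGE can be sketched with a simplified permutation-sampling estimate (a toy version on a known function, with hidden features imputed by the data mean; this is not the authors' algorithm or their conditional-distribution treatment).

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 5_000, 3
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1]   # feature 2 is uninformative

def f(Z):                            # the "model": here, the true function
    return 2.0 * Z[:, 0] + 1.0 * Z[:, 1]

def loss(S):
    """MSE when only features in S are revealed; others held at their mean."""
    Z = np.tile(X.mean(axis=0), (n, 1))
    Z[:, list(S)] = X[:, list(S)]
    return np.mean((f(Z) - y) ** 2)

# Average, over random feature orderings, the loss reduction from
# revealing each feature: a crude Shapley-value estimate.
phi = np.zeros(d)
n_perms = 50
for _ in range(n_perms):
    order = rng.permutation(d)
    S, prev = set(), loss(set())
    for j in order:
        S.add(j)
        cur = loss(S)
        phi[j] += (prev - cur) / n_perms   # credit the loss drop to feature j
        prev = cur
```

On this toy problem phi recovers the intuitive ranking (feature 0 most important, feature 2 worthless), and the values sum to the total loss reduction from revealing all features, the additivity property the paper's framework formalizes.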


How does a Neural Network's Architecture Impact its Robustness to Noisy Labels?

Neural Information Processing Systems

Noisy labels are inevitable in large real-world datasets. In this work, we explore an area understudied by previous works --- how the network's architecture impacts its robustness to noisy labels. We provide a formal framework connecting the robustness of a network to the alignments between its architecture and target/noise functions. Our framework measures a network's robustness via the predictive power in its representations --- the test performance of a linear model trained on the learned representations using a small set of clean labels. We hypothesize that a network is more robust to noisy labels if its architecture is more aligned with the target function than the noise. To support our hypothesis, we provide both theoretical and empirical evidence across various neural network architectures and different domains. We also find that when the network is well-aligned with the target function, its predictive power in representations could improve upon state-of-the-art (SOTA) noisy-label-training methods in terms of test accuracy and even outperform sophisticated methods that use clean labels.
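The paper's robustness measure, the test performance of a linear model fit on frozen representations with a small clean-label set, can be sketched as follows (synthetic stand-in "representations" via numpy least squares; all names and data here are hypothetical, not the authors' setup).

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_probe_accuracy(reps, labels, n_clean=100):
    """Fit a linear probe on a small clean set; report held-out accuracy."""
    Xtr, ytr = reps[:n_clean], labels[:n_clean]
    Xte, yte = reps[n_clean:], labels[n_clean:]
    w, *_ = np.linalg.lstsq(Xtr, 2.0 * ytr - 1.0, rcond=None)  # targets in {-1, 1}
    return ((Xte @ w > 0).astype(int) == yte).mean()

# Hypothetical learned representations: one set keeps the target linearly
# recoverable (architecture aligned with the target), one is pure noise.
n, d = 1_100, 20
z = rng.normal(size=(n, d))
labels = (z[:, 0] > 0).astype(int)
aligned_reps = z
noisy_reps = rng.normal(size=(n, d))

acc_aligned = linear_probe_accuracy(aligned_reps, labels)
acc_noisy = linear_probe_accuracy(noisy_reps, labels)
```

The probe's accuracy gap between the two representation sets is the kind of signal the framework uses to compare how well different architectures preserve the target function under label noise.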